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In a meta-analysis, it is important to specify a model that adequately describes 
the effect-size distribution for the underlying population of studies. The conven- 
tional normal fixed-effect and normal random-effects models assume a normal 
effect-size population distribution, conditionally on parameters and covariates, 
and assume that the mean (i.e., mode) describes a single overall effect-size. 
However, for skewed or more multimodal population effect-size distributions, 
the median better-indicates central-tendency, and multiple modes indicate the 
presence of multiple overall effect sizes in the population, not only one. To ad- 
dress such issues, we propose a Bayesian nonparametric meta-analysis model, 
which can describe a wider range of effect-size distributions, including unimodal 
symmetric distributions, as well as skewed and more multimodal distributions. 
The model encourages the inference of the whole effect-size distribution, includ- 
ing the mean, median, mode(s), and skewness, and allows the whole effect-size 
distribution to change flexibly with the covariates, while automatically identify- 
ing the significant covariates of the mean effect-size. We demonstrate our model 
through the analysis of real meta-analytic data arising from behavioral-genetic 
research, and through a simulation study. We compare the parameter estimates 
of the Bayesian nonparametric model, against the estimates of conventional and 
of more modern normal fixed-effects and random-effects models. 
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1 Introduction 



A research synthesis aims to integrate results from empirical research so as to produce 
generalizations (Cooper & Hedges, 2009). Meta-analysis, also referred to as the analysis 
of analyses (Glass, 1976), provides a quantitative synthesis of multiple research studies, 
where each study provides data for an effect-size statistic of interest, along with its sampling 
variance, and any covariate(s). Typical examples of effect-size statistics include the unbiased 
standardized mean difference between two independent groups (Hedges, 1981), among others 
(Konstantopoulos, 2007). Given a sample of effect-size data, the primary aim of a meta- 
analysis is to infer the overall effect-size distribution from the given study population, as well 
as to infer the heterogeneity of the effect sizes. A conventional summary of the overall effect- 
size distribution includes the mean, which is often referred to as the "overall effect-size." A 
conventional summary of heterogeneity is provided by the variance of the overall effect-size 
distribution. Heterogeneity can be further investigated though a meta-regression analysis, 
which studies how the mean effect-size relates to key study-level covariates (e.g., Berkey et al., 
1995; Thompson & Sharp, 1999; Thompson & Higgins, 2002; Higgins & Thompson, 2004). 
Importantly, a meta-regression analysis can be used to investigate and test for publication 
bias in the data, by relating effect-sizes with their standard errors (precisions; the square-root 
of the effect-size variances) (Thompson & Sharp, 1999). This actually provides a regression 
analysis for the funnel plot (Egger et al., 1997). 

The normal fixed-effects model and the normal random-effects model provide two tradi- 
tional approaches to meta-analysis (Hedges & Vevea, 1998; Konstantopoulos, 2007; Boren- 
stein et al. 2010). Each model is a weighted linear regression model, which treats study- 
reported effect-size as the dependent variable, which weighs each reported effect-size by the 
inverse of its sampling variance, assumes normally-distributed regression errors, and rep- 
resents the mean study effect-size by the intercept parameter. The normal random-effects 
model is a two-level model which allows for between-study variance in the effect-sizes, through 
the addition of random intercept parameters that are assumed to be normally distributed over 
the given study population. Specifically, effect-sizes at the first level are nested within stud- 
ies at the second level. A three-level meta-analysis model can accommodate data structures 
where, say, studies (level 2) are nested within school districts (level 3) (Konstantopoulos, 
2011). Given a sample set of meta-analytic data, the parameters of a normal fixed-effects 
or normal random-effects model can be estimated via maximum-likelihood. Alternatively, a 
Bayesian inference approach can be taken, which involves the specification of a prior distri- 
bution on all parameters of the given meta-analytic model (e.g., Higgins et al., 2009). Then 
Bayesian inference of the data is based on the posterior distribution of the parameters. This 
distribution is formed by combining the data (likelihood) information of the given model, 
with the information from the prior distribution on all model parameters. 

The conventional normal fixed-effects and normal random-effects models each assume 
that the effect-size distribution is a unimodal and symmetric, normal distribution, condi- 
tional on all model parameters and on all covariates of interest. The overall effect-size 
distribution arises from a specific choice of the covariates (more details later), and the mean 
of this distribution is the overall effect-size. 

We agree that either a normal fixed-effects model, or a normal random-effects model, 
can provide a reasonable model for the data, provided that the sample data arise from a 
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unimodal and symmetric effect-size distribution for the given underlying study population, 
conditional on any set of covariates of interest. Specifically, under such a model, and if 
the overall effect-size population distribution is unimodal and symmetric, then the mean 
provides a reasonable summary of the overall effect-size (and central tendency) because it 
coincides with the mode of this distribution, among other things. 

However, in the literature on meta-analytic modeling, there has been increased discus- 
sion that the mean can provide a misleading summary of a overall effect-size, and that 
meta-analysis should be based on the inference on the whole effect-size distribution for the 
underlying study population (Higgins et al., 2009; see also Davey Smith, et al., 1997). Here, 
we expand on this argument, by using conventional statistical arguments that the mean does 
not necessarily provide a meaningful summary of central-tendency, for skewed and more mul- 
timodal distributions. Specifically, for a unimodal and skewed effect-size distribution, the 
mean does not necessarily coincide with the median or the mode, and the median provides a 
more appropriate summary of the distribution's central tendency because it is more robust 
to skewness, compared to the mean. For an effect-size distribution that explicitly displays 
multiple modes, the mean may no longer coincide with any of the modes and may even have 
relatively low probability in this distribution. Moreover, the presence of multiple modes 
indicate the presence of multiple overall effect sizes in the distribution (i.e., effect-size values 
located at these modes), not only one overall effect-size. So in summary, we are concerned 
that, in frequent meta-analytic settings where sample effect-size arise from a skewed and/or 
more multimodal population distribution, the mean of this distribution may not provide 
an optimal summary for the practice of meta-analysis, which aims to accumulate evidence 
about particular research issues, of interest to stakeholders, policy makers, and substantive 
researchers. In other words, for such a population distribution, the mean can provide a 
misleading summary under either a normal fixed-effects or a normal random-effects model. 

To address all the issues stated above, we propose a more flexible, Bayesian nonparametric 
model for meta-analysis, which is useful for the analysis of skewed and/or more multimodal 
effect-size distributions, as well as for the analysis of unimodal and symmetric distributions. 
The model is a special case of the general model introduced by Karabatsos and Walker 
(2012), which was studied in more general regression settings. The new model is a type 
of random-effects meta-analytic model, which specifies the effect-size distribution by an 
infinite random-intercept mixture of normal distributions, conditional on any covariate(s) of 
interest, with covariate-dependent mixture weights. Therefore, the model is flexible enough 
to describe a very wide range of effect-size distributions, including all normal distributions, 
as well as all (smooth) distributions that are more skewed and/or multimodal. Therefore, 
the model avoids the empirically-falsifiable assumption that effect-sizes arise strictly from 
symmetric unimodal distributions, such as normal distributions. Furthermore, the model's 
high flexibility encourages a rich and graphical inference of the whole effect-size distribution, 
as previously recommended for meta-analytic practice (Higgins et al., 2009). 

Also, in the spirit of meta-regression analysis, the Bayesian nonparametric meta-analysis 
model allows the whole effect-size distribution to change flexibly and non-linearly as a func- 
tion of key study- level covariates of interest. This feature permits a rich and flexible meta- 
regression analysis. Moreover, for a given meta-analysis, this model can automatically iden- 
tify significant covariates, that is covariates that significantly predict changes in the mean 
effect-size, in a model-based and non-ad-hoc fashion. Specifically, the model makes use of 
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spike-and-slab priors for the regression coefficients that allow for automatic covariate (predic- 
tor) selection in the posterior distribution. Such priors were developed for Bayesian normal 
linear regression models (George & McCulloch, 1997). Other flexible Bayesian nonparamet- 
ric models have been proposed for meta-analysis (Burr & Doss, 2005; Branscum & Hanson, 
2008), but they do not address covariate information. Moreover, under either a normal 
fixed-effects or normal random-effects model inferred under a non-Bayesian (Frequentist) 
framework of maximum-likelihood estimation, the identification or selection of significant 
study-level covariates (predictors) is challenging because it deals with the standard issues of 
multiple hypothesis testing over predictors (e.g., Thompson & Higgins, 2002). The often- 
used stepwise procedures of covariate selection are known to be ad-hoc and sub-optimal. 

The Bayesian nonparametric model is " nonparametric" in the sense that it assigns a 
prior distribution to infinitely-many (or very many) model parameters (Miiller & Quintana, 
2004), for the purposes of defining a very flexible model for describing a wide range of data 
distributions, including unimodal distributions, and more multimodal distributions. Thus, 
the term "nonparametric" is actually a misnomer (Miiller & Quintana, 2004). Nevertheless, 
a Bayesian nonparametric model avoids the more restrictive assumptions of "parametric" 
models, namely, that the data distribution can be fully- described by few finitely-many pa- 
rameters. For example, a model that assumes a normal distribution, assumes that the data 
distribution can be wholly described by two parameters, namely mean and variance. This 
implies the assumption that this distribution is unimodal and symmetric. 

We now describe the layout of the sections in the rest of the paper. Since the paper 
covers various key statistical concepts, it is necessary to first give them a brief review in 
Section 2. In that section, we review the basic data framework of meta-analysis, including 
effect sizes. Then we review the conventional normal fixed-effects and normal random- 
effects models, the Bayesian inference framework in Section, the key ideas of Markov Chain 
Monte Carlo (MCMC) estimation methods for Bayesian estimation, and the traditional 
Bayesian approaches to meta analysis based on conventional normal fixed-effects and normal 
random-effects meta-analytic models. Then, we review some of the key concepts of discrete 
mixture modeling, which underlies the Bayesian nonparametric (infinite-mixture) models. 
Then in the third section, we describe our new Bayesian meta-analysis model. Given a 
set of data, the posterior distribution of the model parameters is estimated using Markov 
chain Monte Carlo (MCMC) methods, which are described in Appendix B. In the fourth 
section, we review a standard criterion for comparing the predictive performance between 
different Bayesian models that are fit to a common data set, for the purposes of identifying 
the single model that has best predictive-fit, i.e., of identifying the single model that best 
describes the underlying population distribution of the sample data. In this study, we use 
the criterion as a diagnostic for the detection of skewed and more multimodal effect-size 
distributions, from sample data. The fifth section illustrates our Bayesian meta-analysis 
model through the analysis of simulated data, and the analysis of a large meta-analytic data 
set of behavioral genetic studies, involving 24 covariates (Talbott et al., 2012). For each of 
these data sets, that section describes the results of comparisons, between our model, and 
versions of conventional and more modern normal fixed-effects and normal random-effects 
models. The last section ends with conclusions. 
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2 Review of Meta-Analytic Modeling Concepts 



Before we review the key concepts underlying the various approaches to meta-analysis, we 
describe some notation that we will use in the remainder of this paper. Following the standard 
notation of statistics, ~ will mean "distributed as"; n(-\^,v) denotes the ("bell-shaped") 
probability density function (p.d.f.) of the normal distribution having mean and variance 
(/i, v); the p.d.f. of the n-variate normal distribution with mean vector /x and (symmetric 
and positive-definite) variance-covariance matrix S is denoted by n n (-|^t, E); the p.d.f. of a 
gamma distribution with shape and rate parameters (a, b) is denoted by ga(-|a, b); and the the 
p.d.f. of a uniform distribution with minimum and maximum parameters (a, b) is denoted by 
un(-|a,6). Finally, 5q(-) denotes the degenerate distribution that assigns probability 1 (full 
support) to the number 9. 



2.1 Data Framework of M eta- Analysis 

In a typical meta-analysis context, data are available on n study reports, indexed by 
i = 1, ... ,n. Each study report provides information about an effect-size i/i of interest, on 
the basis of rij observations, and provides information about p study characteristics which 
are described by p covariates Xj = (1, xu, . . . , x P i) J , including the constant (1) term for future 
notational convenience. A full meta-analytic data set is denoted by V n = {(y i; Xj)}™ =1 . 



Enter Table 1 



Table 1 presents some typical examples of effect-size statistics that are often used in meta- 
analysis, along with their sampling variances (of) (Konstantopoulos, 2007; Borenstein, 2009, 
Fleiss & Berlin, 2009). In general, let ~S n = (au) nX n denote the sampling covariance matrix 
for a set of n study reports, having diagonal elements an = a?, and where each off-diagonal 
element of the matrix indicates the sampling covariance between a given effect-size pair (?/j, y{) 
(Gleser & Olkin, 2009). To ease the presentation of the various meta-analytic models we 
describe in the subsequent sections, we will assume a diagonal matrix £ n = diag(a^, . . . , <r^), 
as is commonly done in the practice of meta-analysis. It is straightforward to extend the 
meta-analytic models to the case of non-diagonal covariance matrix ~S n (but such extensions 
will clutter the notation). However, as we discuss in Section 3, this assumption can be made 
under the Bayesian nonparametric meta-analytic model, with no loss of generality. 



2.2 Traditional Meta-Analytic Models 

A traditional meta-analysis model assumes that, for a given set of data T> n = {(?/«, Xj)}" =1 , 
the effect-size distribution follows the general form: 

f(yi\xuC) = n(^|xJ/3 + /i 0i + /i 0M(i) ,o\ 2 ), i = l,...,n; (la) 

xJ/3 = (3 + H h (3 p x ip ; (lb) 

(/%> • • • ; / i o«)l s o ~ n n (0,a 2 I n + ^M n ); (lc) 

A%kI a oo ~ n (°> ^oo); t = 1, . . . ,T. (Id) 
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Typical meta-analytic models are special cases of the general normal model shown in equation 
(1). In all, the general traditional model (1) has likelihood density f(y\x; £) with parameters 
C = (/3, At , /x 00 , <7q, <Tq , ipy , which are to be explained now. 

The intercept parameter /3 is interpreted as the mean effect-size over the given popu- 
lation of studies (Louis & Zelterman, 1994). This interpretation holds true, provided that 
each of the p covariates has data observations (x/d, . . . , £fc n ) T that have already been cen- 
tered to have mean zero, as we assume throughout. Also, the p covariates are respectively 
parameterized by linear slope coefficients . . . , (3 p ). 

Also, /x = (/i 01 , . . . , « 0n ) T are the level-2 random intercept parameters. Similarly, /x 00 = 
(/i 001 , . . . , « ot) t are the level-3 random intercept parameters, with each of the n study reports 
being nested within exactly one of T < n study reports (t = 1,...,T), and with n QOt ^ 
meaning that the level-3 intercept gned to the ith study report. As shown in the 

model equations (lc) and (Id), the random intercepts fi and /x 00 are each assumed to have 
a multivariate normal distribution (probability density). 

A normal fixed- effects model assumes that all the random intercept parameters (/x , /x 00 ) 
are zero. In terms of the general normal meta-analytic model (1), this corresponds to the 
assumption of zero variances, i.e., <Tq = <Tq = ip = (Hedges & Vevea, 1998). A 2- 
level normal random- effects model allows for non-zero random intercepts /x , by allowing 
for nonzero variances (al,ip), as shown in equation (lc). Typical normal random-effects 
models assume that the random intercepts /x are uncorrelated with n-variate normal density 
n n (^ |0, crpn) (Hedges & Vevea, 1998). However, it is possible to model correlations between 
the level-2 random intercepts /x . For example, Stevens and Taylor (2009) consider a 2-level 
normal random-effects model which assumes the n-variate normal density n n (/i |0, So) f° r 
the level-2 random intercepts, using a more general covariance structure S = crpn + ^M n , 
where the parameter ip represents the covariance between pairs of study reports. Also, M n 
is a fixed n x n indicator (0-1) matrix, with Is specified in the off-diagonal to reflect a-priori 
beliefs as to which pairs of the n study reports have correlated level-2 random intercepts, and 
with zeros specified for all the other entries of M n . Finally, a 3-level normal random- effects 
model also allows for non-zero random intercepts ^i 00 , by allowing for a positive variance 
<Jq , as shown in equation (Id) of the general normal model (e.g., Konstantopoulos, 2011). 

The general normal model described in equation (1) can be written explicitly as a normal 
mixture of multivariate normal model: 

f(yi\xi;f3,a 2 ,ijj,al ) = j n n (y|X/3 + )dGV ,/x 00 ), 

for effect-size data y = (yi, ...,y n ) J and given the matrix X of row vectors xj, i = 1, . . . , n. 
Specifically, the mixture model presented above assumes that the mixture distribution G(/x , /x 00 ) 
is a multivariate normal distribution with probability density n n (/^ |0, cjQl n +'?/ ; M n )nT(M 00 |0, al I n ). 

For any of the models described in this section, full maximum likelihood methods can be 
used estimate the parameters £ = (/3, /x , /x 00 , o"q, <Tq , ip) J , from a given data set V n . Alter- 
natively, the parameters of a random-effects model can also be estimated by the restricted 
maximum-likelihood method, which focuses estimation on the variance parameters (Harville, 
1977). The technical details of these maximum-likelihood methods are well-documented (e.g., 
Raudenbush & Bryk, 2002, Ch. 3, 13-14; Stevens & Taylor, 2009). 
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For any one of the individual models that is described in this section, the estimate of 
the effect-size distribution of the underlying population is given by the density / n (j/|x ; C) — 
n(y\(3 ,a 2 + <Tq + (Xq ) of the normal distribution, given a maximum-likelihood estimate £ 
obtained from a sample data set T> n , and after controlling for all p covariates via the covariate 
specification x = x = (1, 0, . . . , 0) T . 

As an alternative to the maximum-likelihood approach, parameter estimation can be 
performed in a fully-Bayesian framework. The Bayesian framework is based on the speci- 
fication of a prior density (distribution) 7r(£) on the parameter space f2<j = {(,} C M dim ^, 
where dim(£) denotes the dimensionality of £ (e.g., Higgins et al., 2009). Next, we pro- 
vide an overview of the Bayesian inference framework. Then we show how the framework 
is typically applied for statistical inference with the general normal meta-analytic model of 
equation (1). 



2.3 Review of Bayesian Inference 

For a general statistical model, let /(y|x; £) denote the likelihood density of the data 
point y, given covariates x and model parameter £, which has space Q^. Under the model, a 
given data set V n = {(?/i, Xj)}™ =1 has likelihood L(V n ; £) = Yl7=i /(2/j| x ij 0- I n the Bayesian 
approach to statistical inference, the model parameter £ is assigned a prior probability density 
7r(£) on the parameter space Q^, and therefore the approach also treats the model parameter 
( as a random variable. The prior density reflects prior (pre-experimental) beliefs about the 
plausible values of the parameter, for the data set T> n at hand. 

According to standard arguments of probability theory involving Bayes' theorem, the 
model's data likelihood L(T> n ; £) combines with the prior density 7r(£), to yield a posterior 
density for £, defined by: 

v (t\Ty\ LjVnXMC) Iir=i f(ui\*i'i CMC) (9) 

^ n) J n( L(v n x)du(c) / nc nr=i/(^l^;C)dn(c)' u 

where IT(£) denotes the cumulative distribution function (c.d.f.) of the prior density 7r(£). 
The posterior density describes the plausible values of the model parameters £, given prior 
beliefs that are reflected by the prior density vr(^), and given the data T> n having information 
that is summarized by the likelihood L(V n ; £). 

Clearly, the specification of the prior density vr(C) is an important step in a Bayesian 
analysis, and the posterior ir(C\V n ) can be quite sensitive to the chosen prior in situations 
where the sample size (n) is not large. Also, often in the practice of Bayesian data analysis, 
there is a lack of prior information about the parameters £ of the given model. This lack 
of prior information is reflected by a "diffuse" prior probability density 7r(£) that has high 
variance, and assigns rather-equal but broad support over the parameter space Q^. Such 
priors are often referred to as "non-informative", even though technically speaking, a prior 
cannot be fully non-informative. As a consequence of specifying such a diffuse prior 7r(£), the 
posterior density 7i(£\U n ) becomes mostly determined by the data V n likelihood £(£>„;£), 
relative to the prior. As we discuss later, such rather non-informative priors are commonly 
specified in the practice of Bayesian meta-analysis. 

Prediction provides a basic function of statistical modeling, and from a Bayesian model, 
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predictions are made on the basis of its posterior density ir(£\V n ). Specifically, the posterior 
predictive density of Y given x is given by: 



For the purposes of meta-analysis, the posterior predictive density f n {y\x) provides an esti- 
mate of the true effect-size density (distribution) for the underlying study population, given 
sample data T> n and covariates x of interest, under squared-error loss (Aitchison, 1975). 
Hence, for the purposes of a Bayesian meta-analysis, this density provides inference of the 
whole effect-size distribution. 

2.4 Review of Markov Chain Monte Carlo (MCMC) 

In most Bayesian models, C, is a high- dimensional parameter vector, and the direct evalu- 
ation of the posterior equations (2) and (3) require prohibitive high-dimensional integrations. 
However, in such a situation, one may use methods of Markov chain Monte Carlo (MCMC) 
to estimate such posterior densities of the given model. In the typical use of MCMC, the 
parameter vector £ is first broken up into B non-overlapping blocks (£ x , . . . , C, B ). Then the 
full conditional posterior densities 7r(Ci|f n> ■")>•••> 7r (Csl^ , n, ■ - • ), in turn, are sampled at 
each stage of an MCMC algorithm, where each full conditional is sampled through the use of 
a Gibbs sampler, Metropolis-Hastings algorithm, or other sampling algorithm (e.g., (Gelfand 
& Smith, 1990; Chib & Greenberg, 1995). The 5-step sampling algorithm is repeated a large 
number S of times, to construct a discrete-time Harris ergodic Markov chain {C^}f=i hav- 
ing the posterior distribution n(£|X> n ) as its stationary distribution (for related definitions, 
see Meyn & Tweedie, 1993; Nummelin, 1984; Roberts & Rosenthal, 2004). Harris ergodic- 
ity guarantees good "mixing" in the sense that the Markov chain thoroughly explores the 
support of the posterior distribution, asymptotically (S — > oo). This ergodicity condition 
is ensured by a proper prior distribution and the correct implementation of an MCMC al- 
gorithm (Robert & Casella, 2004, Section 10.4.3). Also, to generate samples {y pred ^|x}f =1 
from the posterior predictive density /n(2/ pred ^|x) given covariate x of interest, another step 
may be added for each stage s of the MCMC algorithm, which samples from /(|/ pred |x; C^). 

In the practice of MCMC, S can only be finite, and therefore it becomes important to de- 
cide on a sufficiently large S that produces MCMC samples {C^"* }f=i which well-approximate 
samples from the true posterior density II(£|X> n ), and produces MCMC samples {?/ prcd |x}f =1 
which well- approximate samples from the true posterior predictive density f n (y\x), given co- 
variates x of interest. According to textbook recommendations (Geyer, 1992; 2011, Chapter 
1), one should perform one very long MCMC run, i.e., a run with very large S, followed 
by an inspection of trace plots of MCMC samples of several one-dimensional parameters 
of interest for data analysis. The trace plots are used to confirm that the MCMC samples 
of the one-dimensional parameters appear to mix well, i.e., thoroughly explore the support 
of its marginal posterior distribution, without strong correlation over these MCMC sam- 




(3) 



and this density has posterior predictive mean (expectation) and variance (Var): 



E n (F|x) = fyf n (y\x)dy, Var n (F|x) = f{y - E(Y\x)} 2 f n (y\x.)dy . 
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pies. Then after confirming the adequate mixing of the MCMC samples, the 95% Monte 
Carlo confidence interval (MCCIs) of each of these univariate parameters is computed. If 
the MCCIs are small enough for practical purposes, then the chosen number S of MCMC 
sampling iterations is decided to be large enough. If the MCCIs are not small enough, then 
the MCMC algorithm for a larger number of MCMC sampling iterations, until the MCCIs 
become sufficiently small. 

The MCCIs of posterior moments (e.g., means, variances, etc.) can be consistently 
estimated though the use of specific batch-means methods (see Jones et al., 2006,). Specif- 
ically, let <p(C) denote a one-dimensional parameter of the model parameters more gen- 
erally, <p(C) can just be a one-dimensional functional of interest. Suppose that the output 
{C^}f=i °f the sampler is broken into batches of equal size that are assumed to be ap- 
proximately independent, and let S = asbs- Also, suppose that {v(C )}f=i are the S 
MCMC samples that are generated for the given functional, and it is of interest to con- 
struct a given posterior moment estimate y$ n (C) of f(C)t such as a posterior mean es- 
timate <Pn(C) = E[v?(C)|^n] = £ J2s=i f(C^)j or a posterior variance estimate </3 n (C) = 
Var^(C)|£>„] = § Ef=i(^(C (s) ) - Eb(C)pn]) 2 - Then a consistent estimate of a 95% MCCI 
around the posterior moment estimate is given by y$ n (C) ±t*(&s/ S 1 ^ 2 ), where t* is the .975th 
quantile (97.5%ile) of the student(0, 1, as — 1) t-distribution with as — 1 degrees of freedom, 
and where a 2 s = (bs(as-l)) Y^=i(&n '(C) — ^n(C)) 2 ; ¥n\C) is the posterior moment estimate 
of batch I, where bs = floor(S' 1 / 2 ) and as = floor(Syfes). The same procedure can be applied 
to MCMC posterior predictive samples {y pred |x}f =1 , to estimate the 95% MCCI around a 
posterior predictive moment estimate. 

2.5 Review of Bayesian Normal Meta-Analytic Models 

In relation to the Bayesian inference framework described in Section 2.3, consider the 
general normal meta-analytic model (1), which has likelihood density given by (la), and 
which has parameters £ = (/3, 7, /j, , /x 00 , cx 2 ,, ctq , ip) J . Here, 7 = (7i,---,7 p ) T are included 
as parameters which respectively indicate (0-1) whether or not the p covariates are included 
(7 = 1) or excluded (7 = 0) from the model. In typical practice involving such a model 
(including special cases), the prior has the general form: 

tt(C) = n(/3|0, V 7 )7r(7)n n (/x |0, a 2 I n + ^M n )n T (/x 00 |0, a 2 )7r(a 2 , ^)7r(a 2 ). (4) 

In Bayesian meta-analytic modeling, it is common practice to specify a diffuse normal prior 
density for the coefficients j3 by taking V 7 = vl p+ i, with v — > 00 (e.g., DuMouchel & 
Normand, 2000), with the implicit assumption that 7r(7 = 1) = 1. 

Also, it is common in the practice of Bayesian random-effects modeling to attempt to 
assign a non-informative prior for (cr 2 ,, o"q ) via the specification of inverse-gamma priors 
7t(<7q) = ga(cjQ 2 |e, e) and vt(ctq ) = ga(<jQ 2 |e, e), for small choice of constant e > (Gelman, 
2006), implying a prior density of the form 7r(cr 2 ) ,'i/0 = 7r ( cr o)^o(V ; )- Though, recall from 
Section 2.2 that a more general multivariate normal n n (/x |0, crpn + "0M n ) mixture distri- 
bution can be specified (Stevens & Taylor, 2009). This mixture distribution prior allows for 
correlated level-2 random intercepts /x via a parameter ip that measures the covariance be- 
tween specific pairs of the total n study reports, and where M n = (mu) nxn is a fixed matrix 
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which indicates (0-1) which pairs of the study reports are expected to yield correlated level-2 
random intercepts fj, , with zeros in the diagonal. For the parameters (<Tq, ■?/>), Stevens and 
Taylor (2009) propose the rather non-informative prior density 

vr(a 2 , 4j) = (c /(c + a 2 ) 2 )un(^| - a 2 /(K - 1), a 2 ), (5) 

where the first term the product gives a log-logistic prior density for <Tq, where cq = 
{n/tr([diag(S ra )]~ 1 )} 1 / 2 is the harmonic mean of the sampling variances a 2 {% = 1, . . . , n), 
and K = maxj{^" =1 mu} is the largest group of related study reports. This log-logistic 
prior is right-skewed, highly-dispersed, with quartiles (c /3, c , 3c ). 

Simpler versions of the general normal random-effects model (1) can be specified via 
appropriate straightforward modifications of the prior density (4). A Bayesian 2- level normal 
random-effects model which assumes a^ Q = (i.e., /x 00 = 0), but allows for correlated level-2 
random intercepts fi assigns the prior density 7t(ctq ) = 5 ( cr oo); a 2- level normal random- 
effects model which assumes a^ = ip = (i.e., ^ 00 = 0) and assume independent level-2 
random intercepts /x assigns the prior density ii(al Q ,ip) = ^o^oo) ^oC0)j an d a fixed-effects 
model which assumes <Tq = <Tq = ip = (i.e., fx = and ^ 00 = 0) assigns the prior density 
ir(a 2 ,a 2 Q0 ,i/;) = 5 Q (a 2 )5o(a 2 )5 Q (^). 

Other simple modifications of the prior density (4) can be used to specify other impor- 
tant versions of the general normal random-effects model (1), which consider different priors 
for the random intercepts (/j, ,fi Q0 ). For example, Gelman (2006, Section 7.1) notes that 
posterior inference of the parameter (Jq 2 , under the often-used " non- informative" gamma 
ga(<TQ 2 |e,e) prior, is very sensitive to the choice of small e, especially when the data sup- 
port small values of <Tq. Similarly for the gamma ga(<Jo 2 |e, e) prior for the parameter <Tq . 
Therefore, in a situation where there is little prior information available about the param- 
eters (ctq,(Jq ), he recommends other priors as alternatives to the gamma ga((Xo 2 |e, e) prior. 
Specifically, he recommends the specification of uniform prior densities 7r(cro) = un(cr |0,&o) 
and 7r(cr o) = un(cr o|0, 6oo) f° r reasonably-large values (b ,b 00 ), whenever n and T are both 
at least 5. When there is more prior information is desired, say when n is less than 5, he 
recommends the half-t prior density of the general form vr(o"o) oc (1 + 1 (o"o/feo) 2 ) ~( a+1 )/ 2 , 
and similarly for the level-3 variance parameter cr o- 

Finally, while it is common practice to assume a diffuse prior for the regression coefficients 
f3 = (f3 Q , j3 1 , . . . , (3 p ) J , along with ix{~{ = 1) = 1, in principle one may specify spike- and- slab 
priors for the slope parameters (/3 1; . . . , /3 p ) in order to enable automatic variable (covariate) 
selection via posterior inference (George & McCulloch, 1997), along with a diffuse prior 
(3 ~ n(0, v — > oo). These spike-and-slab priors are defined by independent normal and 
Bernoulli prior densities for the parameters, so that the n(/3|0, V('y))7r(7) prior in (4) is 
based on: 

V 7 = diag(t; ->■ oo,v (l - 7i) + Ui7i» • • • iM 1 ~ 7 P ) + v iJ P ) (6a) 

^) = nL pr(7fc=i)7fc[i ~ pr(7fc=i)]l_7fe ' (6b) 

where Vq is a small prior variance (e.g., Vq = .001), v\ is a large prior variance (e.g., Vq = 10), 
and Pr(7 fc = 1) is the Bernoulli probability parameter that is often set to .5 in practice 
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(George & McCulloch, 1997). So on the one hand, with prior probability Pr[7 fc = 1] = .5, 
the kth covariate is included in the model gnificant" predictor of the effect-size, by- 

assigning its regression coefficient (3 k a normal n(/3 fc |0, v-y) prior density that supports a large 
range of j3 k values. On the other hand, with prior probability Pr[7 fc = 0] = .5, that covariate 
is excluded from the model, by assigning its regression coefficient /3 k a normal n(/? fe |0, Vq) 
prior that places all its support on values j3 k « 0. Given MCMC samples from the posterior, 
7r(£|D n ), the kth covariate can be viewed as a "significant predictor," when the posterior 
inclusion probability of the covariate, Pr[7 fc = l|X> n ], is at least .5 (Barbieri & Berger, 2004). 
We assume throughout that the spike- and- slab prior specifications are given by v = .001 
and v\ = 10. These specifications are consistent with the previous recommendation that the 
ratio vi/vq be no greater than 10,000, for the purposes of reliably estimating the parameters 
(/3,7) via MCMC methods (George & McCulloch, 1997, p. 368). 

For any one of the models described in this section, the posterior predictive density 
/ n (y|xo) provides and estimate of the "overall" effect-size distribution of the underlying 
population, and is a symmetric and unimodal distribution, conditionally on covariates x = 
xo = (1, 0, . . . , 0) T . For example, under a Bayesian normal fixed-effects model, the posterior 
predictive density / n (y|x ) estimate of the overall population effect-size distribution is a 
normal density (distribution). Under a Bayesian normal 2- level random-effects model with 
ga(<TQ 2 |a, b) prior for independent level-2 random intercepts fx , the posterior predictive 
estimate f n (y\xo) of the population effect-size distribution is given by a student density 
(distribution) (e.g., Denison et al. 2002, Appendix). The student distribution is very similar 
to a normal distribution, except that the student distribution has thicker tails. 

For the Bayesian (general) normal meta-analytic model, Appendix A lists all the full 
conditional posterior distributions that are needed to perform MCMC sampling, in order 
to estimate the posterior density 7r(/3, 7, n , fx 00 , <Tq, <Tq , ip\V n ), and to estimate the corre- 
sponding posterior predictive density f n (y\x) for covariates x of interest, as well as to perform 
inference for all other relevant posterior functionals of interest. The full conditionals pos- 
terior distributions of the parameters are based on blocks of parameters that are given by 
(/3, 7, fx , fx 00 , <7q, cJqq, ip). The MCMC algorithm in Appendix A is general in the sense that 
it can be applied to estimate any version of the general normal random-effects model (1), 
with prior of general form (4), as discussed this section. 

2.6 Review of Discrete Mixture Modeling 

Statistical models which assume normal (and normal-like) probability distributions, that 
is distributions corresponding to probability densities that have a specific unimodal and 
symmetric form, can be useful for applied statistics. However, such normal models are ill- 
equipped to fully describe distributions that are more multimodal, including skewed and 
heavier-tailed distributions. A discrete mixture model, which is defined by a mixture of 
distributions via a weighted mixture of probability densities, can capture a wider range of 
distributions; including unimodal-symmetric distributions, unimodal heavy-tailed distribu- 
tions, unimodal skewed distributions, as well as more multimodal distributions. 

In general, a discrete mixture model defines a density /(|/|x; £) that has has general form 
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(e.g., McLachlan & Peel, 2000): 




given model parameters £ = (SI/,^, J), mixing distribution G x which may depend on covari- 
ates x, component indices j = 1, . . . , J, with J the number of mixture components, along with 
component (kernel) probability densities /(y|x; ^(x)) (j = 1, . . . , J), component parameters 
= (^j(x))j =1 (being members of £), and mixing weights (^'(x;C w ))/=i which sum to 1 at 
every at every x 6 /f and depend on parameter A Bayesian discrete mixture model is 
completed by the specification of a prior density 7r(£) over the parameter space S\ = {£}. 
Then following Bayes' theorem, the data likelihood L(V n ) = YYi=i f(yi\ x ij updates the 
prior density 7r(C), to a posterior density 7r(£|X> n ), given by equation (2). It then follows 
that the posterior predictive density is given by equation (3). 

It is known that any smooth probability density can be approximated arbitrarily-well 
by a suitable discrete mixture of normal probability densities (the component densities), 
conditional on any covariates x of interest. Here, "suitable" refers to the appropriate spec- 
ification of the number of components J, of the mixing weights and of a 
prior density 7r(£) that has sufficiently-broad support over the parameter space fl^. Hence, 
an appropriately-specified discrete mixture model, defined by a mixture of normal distribu- 
tions, supports the space of unimodal-symmetric distributions, as well as skewed and more 
multimodal distributions. 

In the next section, we present a Bayesian nonparametric model for meta-analysis, which 
is a special case of the Bayesian nonparametric regression model that was introduced by 
Karabatsos and Walker (2012). The model is defined by an infinite-mixture of normal 
densities (distributions), conditional on each covariate x. The specification of an infinite 
(i.e., J = oo) mixture, along with normal (kernel) probability densities, along with a prior 
7r(£) with sufficiently-broad support on the parameter space, enables the model to support a 
very wide range of smooth probability densities (distributions), conditional on any covariates 
x of interest. In other words, the model assumes an infinite number of mixture components 
J, assumes normal component densities, and as will be shown later, specifies mixture weights 
according to an (infinite) ordered probits regression model. The model has infinitely-many 
parameters, because it has infinitely many mixture components. 

3 The Bayesian Nonparametric Meta- Analysis Model 

For effect-size data, our Bayesian nonparametric meta-analysis model is a type of Bayesian 
random-effects model, defined by an infinite random-intercepts mixture of regressions. The 
model assumes the data likelihood: 

oo 

fiVil^h = n (y^oj + X J A <0l Vi(X T /3 w , <r u ), i = l,...,n. (8) 

j'=-oo 
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The model specifies covariate-dependent mixture weights Wj(xJ/3 w , a u ), each defined by the 
difference between two standard Normal(0,l) cumulative distribution functions (c.d.f.s): 

ujjix}^, a u ) = - y$0 u }/a u ) - - 1 - x[/3JK). 

Actually, the mixture weights are categorical probabilities of a cumulative-probits regres- 
sion model (e.g., McCullagh, 1980), for infinitely- many ordered categories indicated by 
j — 0, ±1, ±2, Obviously, the mixture weights Uj(yiJfl ul , a w ) sum to 1 at every x G X . 

— Enter Figure 1 — 

As shown in equation (8), for a set of data V n = {{yi, Xj)}™ =1 , the model assumes that 
each effect-size yi is distributed by according to a probability density /(2/i|xj;£) that is 
constructed by a mixture of an infinite number of normal densities n(z/j|// --l-xJ/3, (jxrf), having 
corresponding means /i - + xJ/3 and mixture weights Wj(xJ/3 w , a J) (for j = 0, ±1, ±2, . . .). 
Therefore, given covariates x, the model is flexible enough to allow the shape of the effect- 
size distribution (density) /(y|x) to take on virtually any form; this density can be unimodal 
symmetric, or skewed, or more multimodal. Moreover, the model allows the entire shape and 
location of the effect-size distribution (density) /(y|x) to change flexibly with the covariates 
x. The model has these flexibilities, because it models the effect-size density /(y|x) by 
infinitely-many random intercept parameters /x = {l^oj : J = 0, ±1, ±2, . . .}, corresponding 
to infinitely-many covariate-dependent mixture weights {^-(xj/^, aj) : j = 0, ±1, ±2, . . .}. 

In the Bayesian nonparametric model, the parameter controls the level of multimodal- 
ity of f(y\x). To explain, assume for the moment that xJ/3 = 0, for simplicity, and with no 
loss of generality. On the one hand, a small value of indicates that /(y|x) is unimodal, 
i.e., modeled as a unimodal normal density n(yi\/ij, of) for a j satisfying j — 1 < x 1 ^ < j, 
with mixture weight Uj(x J /3 u , a w ) near 1. This is because the function $(x T /3 w /cr w ) is ap- 
proximately for x T /3 w < 0, while it is approximately 1 for x T /3 w > 0. As a u approaches 
infinity, the mixture weights become more spread out, and then f{y\x) becomes multimodal, 
with each mixture weight o;y(x T /3 a; , aj) above zero and much less than 1. These ideas are 
illustrated in Figure 1, which plots the mixture weights and the corresponding density of 
our model, f(y\x), for a range of cx w , given x T /3 w = .7, given samples of (^,cr|) from a 
normal-gamma distribution. As shown, the conditional density /(y|x) is unimodal when o u 
is small, and /(y|x) becomes more multimodal as increases. The level of multimodality 
in the data is indicated by the posterior distribution of a^, under the model. 

The Bayesian nonparametric meta-analytic model (8) is completed by the specification 



of a joint proper prior density ir(C) for the infinitely-many model parameters £ = (/3, 7, 

(n j)j ! L_ oo , (j), /3 W , cr u ), according to the joint prior distributions: 

fi 0j \al ~ n(fi Oj \0,al), j = 0, ±1, ±2, . . . ; (9a) 

(3 ~ n(/3 |0,^oo); (9b) 

(/3 fc , 7fc )|0 ~ n(/3|O,01O^OOl 1 ^).5 7fc (l--5) 1 - 7fe , k = l,...,p; (9c) 

0- 1 ~ ga(aJ 2 |a /2,a /2) (9d) 

ctq ~ un(cr |0,6 ) (9e) 

((3 U ,(T U ) ~ n p+1 (/3jO,^10 5 I p+1 )ga(a: 2 |l,l). (9f) 
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As shown, a diffuse prior is assigned to the overall mean effect-size parameter (3 Q . Also, 
as shown in (9c), we adapt the default spike-and-slab priors, to enable automatic covariate 
(predictor) selection in the posterior distribution of our model (George & McCulloch, 1997). 
Also, the gamma prior for the inverse dispersion parameter _1 has mean E(0" 1 ) = 1 and 
variance Var(0 _1 ) = — , with gu indicating the degree of 'belief in this prior (Nam, et al., 

2003). Furthermore, we specify uniform prior density un(<7o|0,&o) for the variance <Tq of the 
random intercepts {f^0j)'jL- oo , for a reasonably-large value b , as consistent with previous 
recommendations (Gelman (, Section 7.1). Alternatively, more generally, one may specify a 
half-t prior density for cxo- Most of the prior distributions in (9) represent default and rather 
diffuse choices of prior, which can be used in general meta-analytic applications where prior 
information is typically limited. Of course, if for a given meta-analytic data set, there is more 
prior (e.g., scientific) information available about one or more of the model parameters, then 
the prior distributions can be modified accordingly. 

Now, it is instructive to relate the parameters of the general normal meta-analytic model 
of equation (1) that are assigned a general prior density of equation (4) (Sections 2.2 and 
2.5), with the parameters of the Bayesian nonparametric meta-analytic model. Across both 
models, the linear regression coefficients f3, including the overall effect-size mean /3 , have 
the same interpretation of how the mean effect-size depends on covariates; the parameters 
7 = (7 1; . . . ,7 p ) T have the same interpretation as (random) indicators of which covariates 
are included as significant predictors of the models; the infinitely-many random intercepts 
(/^ 0j )°^„ oo of the Bayesian nonparametric model have the same interpretation as the level-2 
random intercepts ^ of a normal random-effects meta-analytic model; and the parameter 
(Tq has the same interpretation as the variance of the level-2 random intercepts. A key 
difference is that the Bayesian nonparametric model specifies a discrete infinite-mixture and 
covariate-dependent mixture distribution G x for the random intercept parameter fi , whereas 
a normal random-effects model specifies a normal and non-covariate mixture distribution G 
for the random intercept parameter. Moreover, for the Bayesian nonparametric model, the 
discrete mixture distribution G x induces (random) clusterings among the n study reports x/i 
(via the posterior distribution of the model), in terms of the random intercept parameter 
fi . This clustering serves a role that is similar to the pre-defined clustering of study reports 
in a 3- level normal random-effects model, in terms of the level-3 random intercepts n 00 , 
with the pre-defined clustering chosen by the meta- analyst. Note, however, that such a 
pre-defined clustering can be addressed via appropriate specification of covariates x for the 
mixture distribution G x of the Bayesian nonparametric model. For example, if the n study 
effect-sizes reports have pre-defined clustering into school districts, then the school districts 
can be specified as 0-1 indicator (dummy) covariates in x. Finally, the clustering feature 
of the Bayesian nonparametric model enables the model to account for correlations among 
the study reports i/i {i = 1, . . . ,n), lessening the need to specify a non-diagonal sampling 
covariance matrix S n to account for correlated effect-sizes. 

Following Bayes' theorem, the data likelihood L{V n ) = Yl7=i fiVil^ updates the 
prior density vr(^), to a posterior density 7r(£|D n ), given by equation (2). Then the posterior 
predictive density is given by equation (3), and recall that for a Bayesian meta-analytic 
models, this predictive density gives an estimator of the true effect-size density in the study 
population, given data T> n and covariates x of interest. Also, recall that for the task of 
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covariate selection, the kth covariate can be viewed gnificant predictor," when the 

posterior inclusion probability of the covariate, Pr[7 fc = l^n], is at least .5 (Barbieri & 
Berger, 2004). Finally, the level of multimodality in the density /(y|x; £) is indicated by the 
posterior distribution of cr w . 

Standard Markov Chain Monte Carlo (MCMC) methods, described by Kalli, Griffin, 
and Walker (2011), can be used to perform inference of the posterior density 7r(£|D n ) and 
posterior predictive density f n (y\x) of the model, as well as to perform inference for all 
relevant posterior functionals of interest. These MCMC methods are based on the use 
of strategic latent variables. Appendix B describes the full conditional posterior distribu- 
tions that are sampled at each stage of the MCMC algorithm, for the respective blocks 
((a%j)^=-oo> A 7) 4>i P u , a u>) °f the parameters of the Bayesian nonparametric model, as well 
as a block for the strategic latent variables that are introduced. 



4 Bayesian Model Assessment Methods 

Model selection is the practice of comparing different models that are fitted to a common 
sample data set, and then identifying the single model that best describes or predicts the 
underlying population distribution of the sample data. In meta-analytic practice, it is often 
of interest to perform model selection (e.g., Sutton, 2000, Section 11.7.3). For example, 
model selection is used in meta-analysis to choose between the fixed-effects and random- 
effects model (Borenstein et al. 2010), or to select important predictors of the effect-size in 
a regression setting (Higgins & Thompson, 2004). 

After M regression models are fit to a data set V n , the predictive performance of each 
Bayesian model m e {1, . . . , M} can be assessed by the mean-square posterior predictive- 
error criterion 

n n 

D(m) = ~ E n (F,|x 4 ,m)} 2 + £ Var^x^m) (10) 

t=l i=l 

(Laud & Ibrahim, 1995; Gelfand & Ghosh, 1998). Among the M Bayesian models that are 
compared, the model with the smallest value of D(m) is identified as the model which best 
describes the underlying population distribution of the given sample data set T> n . 

The criterion (10) is a standard criterion that is often used for the assessment and com- 
parison of Bayesian models (e.g., Gelfand & Banerjee, 2010). The first term of (10) measures 
data goodness-of-fit, and the second term is a penalty that is large for models which either 
over-fit or under-fit the data, as in other classical model selection criteria. The criterion (10) 
can be re-written as 

n „ n 

D(m) = (y~ yi) 2 fn(y\^i,m)dy = ^E„[(?/ i - y) 2 |x;,m]. 

i=\ •' i=l 

As we will show in the simulated and real data-examples in Section 5, the D(m) criterion 
provides a useful diagnostic to detect the presence of skewed or more multimodal effect-size 
distributions for the underlying study population, when the criterion is is used to compare 
between a Bayesian normal meta-analytic model of the type described in Sections 2.2 and 
2.5, against the more general Bayesian nonparametric meta-analytic model (Section 3) which 
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can describe a wider range of effect-size distributions, including skewed and more multimodal 
distributions. 

So for a given Bayesian model m G {1, . . . , M}, the estimate of D(m) is obtained by gen- 
erating posterior predictive samples y? red |x, (z = 1, . . . , n) at each iteration s = 1, . . . , S of 
the MCMC chain, and then taking 

S n 2 n 

s=l i=l i=l 

Also, above, the individual square-root quantities y/Di(m) (for i = 1, . . . , n) can be used to 
provide a more detailed assessment about a model's predictive performance, on the original 
scale of the effect-size y. So a large value of yjDi{m) would indicate that the observed 
effect-size yi is an outlier under the model. 

5 Illustrations 

In this section, we illustrate all the methods that we presented in Sections 2-4, through 
the meta-analysis of two simulated and a large real data set. In these analyses we compare 
the results of Bayesian nonparametric meta-analysis model, and the normal fixed-effects and 
normal random-effects models, in terms of parameter estimates. For the simulated data sets, 
we estimated the normal fixed-effects and normal random-effects models using full maximum 
likelihood (MLE), restricted maximum likelihood (REML), and fully Bayesian estimation 
methods. For the real data set, we present only the fully-Bayesian parameter estimates of 
the normal fixed-effects and normal random-effects models, for space considerations, as they 
were quite similar to the corresponding parameter estimates based on both MLE and REML 
estimation. For all applications, each covariate was previously z-standardized to have mean 
and variance 1, by taking = (x' ki — T L 'k)l®'x{k) for i = 1, . . . , n, given the mean and 
standard deviation (fi k ,cr X (k)) °f the original covariate data (x' kl , . . . , x' kn ). This enabled the 
estimate of the intercept parameter (/3 ) to be interpreted as the mean study effect-size, 
and enabled the (3 coefficients to be interpreted on a common scale. Also, for each data 
set, we used the D(m) predictive mean-square error criterion to compare the many different 
Bayesian models in terms of predictive performance, i.e., how well they accurately describe 
the effect-size distribution for the given underlying study population, given covariates of 
interest. 

For all Bayesian models that were fit to the simulated and real data sets, we specified 
the same prior densities for parameters that the models shared in common, in order to 
place the Bayesian model comparisons on a rather equal-footing. These prior specifications 
were generally consistent with the recommendations of the previous literature, as discussed 
in Sections 2.5 and 3. Specifically, for all Bayesian models, we specified the normal prior 
density vr(/3 ) = n(/3 |0,w — > oo) (with v = 10 5 ); for all Bayesian models no spike-and 
slab priors for the p covariates, we assigned diffuse normal prior densities for their slope 
coefficients, namely vr(/3 fc ) = n((3 k \0,v — >■ oo) (with v = 10 5 ), k = 1, . . . ,p; for all Bayesian 
models with spike-and slab priors for the p covariates, as given in equation 6 of Section 2.5, 
we chose for these prior hyper- variances v q = .001 and v i = 10, and Bernoulli prior parameter 
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of Pr(7 fc = 1) = |; for all Bayesian random-effects models that assume independent level- 
2 random intercepts, we specified the uniform prior density 7r(a"o) = un(o"o|0, 100) with 
large scale (100) for the level-2 variance parameter ex 2 ,; for all Bayesian 2-level random- 
effects model that allows for correlated random intercepts (Stevens & Taylor, 2009), we 
specified the rather non-informative prior for (a 2 ,, ip) that was presented in equation (5) of 
Section 2.5; for all Bayesian 3-level random-effects models, we specified the uniform prior 
density 7r(<7 o) = un(cr o|0, 100) with large scale (100) for the level-3 variance parameter 
(Tq . For the Bayesian nonparametric model, we specified diffuse or high- variance priors 
0" 1 ~ ga(o^ 2 |l/2, 1/2), pja* ~ n p+1 (/3j0, <x 2 10 5 I p+ i), and <xj 2 ~ ga(aj 2 |l,l), as shown 
in equations (9d) and (9f) of Section 3. 

For each application of a Bayesian model, the posterior distribution of the parameters 
were estimated on the basis of 200,000 MCMC samples, after trace plots indicated that 
MCMC samples of key model parameters and of the D(m) criterion stabilized and mixed 
well, and after 95% Monte Carlo confidence intervals (MCCIs) of these quantities attained 
half-widths that were small for practical purposes, i.e., that ranged between .01 and .05, 
and not exceeding .1. Again, these procedures accord with previous recommendations for 
checking the quality of MCMC-based posterior estimates, based on a single MCMC run 
(Geyer, 1992; 2011, Chapter 1). Our Bayesian nonparametric model was estimated using 
a program code we wrote in the MATLAB (Natick, VA) software language, for an earlier 
paper (Karabatsos & Walker, 2012). The Bayesian normal fixed-effects and random-effects 
models were each estimated using code we wrote in MATLAB. Finally, each of the 2-level 
and 3-level random-effects models were estimated using the nmle package (Pinheiro, et al. 
2010) of the R statistical software (R Development Core Team, 2012) 

5.1 Simulated Data 

Here we compare the estimation and predictive performance, between the normal fixed- 
effects model, the normal 2-level random-effects meta-analytic model, and the Bayesian non- 
parametric meta-analytic model, through the analysis of two simulated meta-analytic data 
sets. To keep the illustration simple, we consider no study-level covariates. As mentioned, 
for the normal fixed-effects and random-effects-models, we consider MLE, REML, and fully 
Bayesian estimation. 

The first "unimodal" data set was simulated under the condition where the true effect- 
size distribution (density) in the study population resembles a normal distribution, with the 
single mode at 1. In other words, this study population consists of one "significant" overall 
effect-size, located at this mode. Specifically, for this simulation, we sampled 35 effect-size 
study reports yt independently from a normal n(?/|l, <r 2 ) distribution, after sampling 35 effect- 
size variances <r 2 independently from a uniform un(a 2 |.05, .3) distribution. The simulated 
sample of 35 effect-sizes had a sample mean of .96, and a sample standard deviation of .54. 

The second "multimodal" data set was simulated under the condition where the true 
effect-size distribution (density) of the study population is bimodal, with modes at —1 and 
1. In other words, this study population consists of two significant overall effect sizes, which 
are respectively located at these modes. Specifically, in this simulation, we sampled 35 
effect-size study reports yi independently from a mixture of two normal distributions n(y \ — 
1, <7 2 )i+n(?/|l, cr 2 )i, with equal mixture weights of |, after sampling 35 effect-size variances 
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a\ independently from a uniform un(cr?|.05, .3) distribution. Note that each independent 
sample r/i from this mixture of two normal distributions is obtained by a draw from a normal 
n(y| — + — Zi), (7j) distribution, after samplings = 1 with probability ~, and otherwise 
taking Zj — with the remaining probability of 

For the unimodal simulation condition, the top part of Table 2 compares the results, 
between the Bayesian nonparametric meta-analytic model, against the normal fixed-effects 
and random-effects models, estimated under either MLE, REML, or fully-Bayes estimation. 
Not surprisingly, according to the D(m) predictive mean-square error criterion, the Bayesian 
normal fixed-effects and normal random-effects models, which assume normal effect-size dis- 
tributions (conditional on all model parameters), had better predictive performance than the 
Bayesian nonparametric model. This is because the data set was simulated specifically from 
unimodal and normal effect-size distribution (technically, a scale mixture of normal distribu- 
tions), and hence the simulating model most closely resembles the normal fixed-effects and 
random-effects models. In other words, the D(m) criterion correctly diagnosed a unimodal 
and symmetric effect-size distribution. Moreover, for the Bayesian nonparametric model, the 
5- number summary (min, 25%ile, median, 75%ile, and max) of the 35 individual yjDi(m) 
predictive residuals was (.3, .5, .6, .8, 1.5), which suggests that some of the 35 unimodal effect 
sizes r/i were outliers under this model. For the best-predictive, Bayesian normal random- 
effect model, the predictive residual yjDi{m) 5-number summary was (.3, .4, .5, .7, 1.4), and 
indicated smaller predictive residuals for the 35 simulated effect-size observations. 

Enter Table 2 

The top part of Table 2 also compares the mean effect-size /3 and 95% confidence interval 
estimates between the models estimated under either MLE, REML, and fully Bayes. For a 
Bayesian model, /3 is the estimated mean of the marginal posterior density of /3 , whereas 
a "95% confidence interval" is actually a posterior credible interval defined by the estimated 
2.5 and 97.5 percentiles of this marginal posterior density. As shown, the overall mean 
effect-size /3 and corresponding 95% confidence (or posterior) intervals are nearly identical 
across all the Bayesian models, and the non-Bayesian model estimated under MLE or REML. 
Also, the overall mean effect-size estimates /3 Q were correctly about 1, and the corresponding 
95% confidence (or posterior) intervals all correctly excluded zero, to correctly indicate a 
significant overall effect-size. 

Recall from Section 2.3 that, for a given Bayesian meta-analytic model, the posterior 
predictive density provides an estimate of the true effect-size density (distribution) of the 
underlying study population, The left panel of Figure 2 presents the posterior predictive 
density estimate for the Bayesian nonparametric model, for the simulated unimodal effect- 
size distribution. As shown, although the Bayesian nonparametric model has the worst 
predictive performance among all the Bayesian meta-analytic models, its posterior predictive 
density f n (y\xo) estimate (xo = 1) did seem to recover well the true effect-size distribution 
that was used to simulate the small (n = 35) unimodal data set. According to this estimated 
density, both the mean and median effect-size was about 1, with some skewness (—.2), and 
with 95% posterior predictive interval of (.3, 1.7), which also correctly indicates a significant 
overall effect-size. Moreover, for this model the marginal posterior mean (standard deviation) 
of the dispersion parameter was 1.5 (.4). 



18 



For the multi-modal simulation condition, the bottom part of Table 2 compares the 
results, between all the meta-analytic models. Not surprisingly, according to the D(m) pre- 
dictive mean-square error criterion, the Bayesian nonparametric model had better predictive 
performance than the Bayesian normal fixed-effects and random-effects models, which as- 
sume unimodal normal effect-size distributions (conditional on all model parameters). This 
is because the data were simulated from a bimodal effect-size distribution, and as mentioned 
earlier, the Bayesian nonparametric model is equipped to estimate distributions that are 
either unimodal or multimodal. Thus, the D{m) criterion correctly diagnosed the presence 
of a multimodal effect-size distribution. Moreover, the Bayesian nonparametric model, the 
predictive residual yjDi(m) 5-number summary was (.3, .4, .5, .6, 1.1). 

Also, the bottom part of Table 2 shows that, for all Bayesian and non-Bayesian models, 
the overall mean effect-size /3 was near zero, and the 95% confidence (or posterior) interval 
estimates included zero to incorrectly indicate a non-significant overall effect-size in the pop- 
ulation of studies. Hence, given that the true population effect-size distribution is bimodal 
at —1 and 1, it is clear that the overall mean effect-size and 95% confidence (or posterior) 
interval provide misleading summaries of the true overall effect-size distribution in the pop- 
ulation, This is true even though that these are quantities of primary interest under either 
a normal fixed-effects or a normal random-effects model. In other words, these estimates 
support a "non-significant" or zero overall effect-size, while completely missing the signifi- 
cant effect sizes of —1 and 1 that exist in the study population. This simulation result has 
important implications for real meta-analytic settings. In particular, if a real meta-analytic 
data set happens to be a sample from a study population that has a multimodal effect-size 
distribution, the overall mean effect-size and corresponding confidence interval may provide 
a misleading sample estimates of the effect-size distribution, which may not help contribute 
to the accumulation of evidence about the research issue of interest in the meta-analysis. 

As mentioned earlier, the Bayesian non-parametric meta-analytic model encourages the 
meta-analyst to perform statistical inference of the entire effect-size distribution, over and 
beyond the mean effect-size. For the multimodal simulated data set, the right panel of 
Figure 2 presents the posterior predictive density / n (y|x ) estimate (x = 1) of the model, 
which serves an estimate of the true overall effect-size density (distribution) of the study 
population. Even though the data set was rather small (n = 35), the model's estimate 
did seem to well-recover the true effect-size distribution that was used to simulate the data 
set, and the estimate even correctly identifies the two modes, at —1 and 1, which are the 
two significant effect sizes in the (synthetic) study population. Also, the for the Bayesian 
nonparametric model, the marginal posterior mean (standard deviation) of the dispersion 
parameter <p was 1.5 (.4). In contrast, for the 2-level random-effects model under REML 
estimation, the estimate of the true effect-size density (distribution) of the study population 
is n(j/|0, c^ + 1.2) for a given effect-size variance a , and similarly for the model under MLE. 
Also, the typical 95% confidence interval (—.4, .3) of the normal random-effect models 
actually covers values of the effect-size that have relatively low support in the true effect-size 
density (distribution) in the study population, as suggested by the right panel of Figure 2. 

— Enter Figure 2 — 
In summary, this simple simulation study, among other things, illustrates that the Bayesian 
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nonparametric meta-analytic model, can well-describe any shape and location of the effect- 
size distributions in the given study population. This is true, despite whether the population 
effect-size distribution is simply symmetric and unimodal (e.g., normal), or when the pop- 
ulation effect-size distribution is more skewed or multimodal. In contrast, while a normal 
fixed-effects model, and a normal random-effects model can each well-describe a simple sym- 
metric unimodal effect-size population distribution, the mean parameter of such a normal 
model can provide misleading inferences for the overall effect-size, when the underlying study 
population distribution is multimodal or skewed. 

5.2 Behavioral Genetics Data 

Antisocial behavior, which includes aggression, willingness to violate rules and laws, 
defiance of adult authority, and violation of social norms (Walker et al. 2003), is the most 
frequent reason why children are referred for mental health services in schools (Adelman & 
Taylor, 2010). Yet, it is the most intractable of all behavior and mental health problems, 
is extremely challenging to treat, and must be addressed across the lifespan (Moffitt, 1993; 
2005). 

— Enter Figure 3 — 

To advance understanding and treatment, many behavioral genetic studies have inves- 
tigated the heritability of antisocial behavior, by correlating ratings of antisocial behavior 
among monozygotic (MZ) identical twin pairs, and among dizygotic (DZ) fraternal twin 
pairs. In each study, the ratings were done either by the mother, father, teacher, self, or an 
observer. Then the heritability, defined as the proportion of phenotypic variance explained 
by genetic factors, is estimated by twice the difference between the MZ correlation and DZ 
correlation, for twins of the same sex. Specifically, for a given gender, suppose that umz 
monozygotic (MZ) identical twin pairs yield a correlation ~p MZ on an antisocial behavior 
trait, such as conduct disorder, aggression, delinquency, and externalizing behavior. Also, 
suppose that noz dizygotic (DZ) fraternal twin pairs yield a correlation 'p DZ on the same 
trait. Then the heritability of the antisocial behavior trait is estimated by: 

h 2 = 2(p MZ -p DZ ) (II) 

(Falconer & Mackay, 1996). This effect-size statistic (11) has sampling variance: 

a 2 = A[{{l-f MZ ) 2 /n MZ } + {{l-f DZ ?/n DZ }]. 

We identified 29 independent studies that provided the information necessary to estimate 
antisocial behavior heritability, for the n = 71 independent samples (study reports) of MZ- 
DZ twin comparisons (Talbott, et al. 2012). These studies were published during years 1966 
through 2009, and their full references are listed in Appendix C. There were 2-3 heritability 
estimates per study on average, and each study provided between 1 to 10 estimates. The left 
panel of Figure 3 presents the heritability estimates of antisocial behavior (i.e., the effect-size 
observations), stratified by gender, by rater type, and by the studies which were numerically 
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identified by publication year order (see Appendix C). The one slightly-negative estimate 
(—.06) may have resulted from sampling error (Gill & Jensen, 1968), as suggested by its 
relatively-large variance. 

Here, it is of interest to perform a meta-analysis of the studies, to learn about the overall 
heritability (effect-size) distribution for the underlying study population, as well as to learn 
how heritability changes with key study-level covariates. Again, analyses will be performed 
using the various fixed-effects and random-effects models under maximum likelihood estima- 
tion, and using our Bayesian nonparametric model. A total of 24 covariates were identified. 
They include publication year, the square root of the heritability variance (denoted SE(ES)) 
to provide an investigation of publication bias; indicators (0-1) of female status (49%) ver- 
sus male; ten indicators of antisocial behavior ratings done by mother (mean=.53), father 
(.06), teacher (.24), self (.15), independent observer (.04), and ratings done on conduct dis- 
order (.03), aggression (.40), delinquency (.10), and externalizing (.48) antisocial behavior; 
an indicator of whether a weighted average of heritability measures was taken within study 
over different groups of raters who rated the same twins (28% of cases, for which the ten 
indicator covariates are scored as group proportions), mean age of the study subjects in 
months (overall mean=119.8, s.d.=49.5); indicators of hi-majority (>60%) white twins in 
study (94%), zygosity obtained by questionnaire (80%), or through DNA samples (68%), 
study inclusion of low socioeconomic (SES) status level subjects (20%) and mid-to-high SES 
subjects (90%), missing SES information (10%), representative sample (85%), longitudinal 
sample (85%), and location of the study in terms of latitude and longitude. 

Given the structure of the data, with the n = 11 heritability (effect-size) estimate reports 
nested within the 29 studies, any one of at least 15 normal fixed-effects models or normal 
random-effects models can be considered for the purposes of meta-analysis. Specifically, they 
include: fixed-effects models; 2-level random-effects models, with level-2 random intercepts 
/Lt assumed to be independent via the specification of a multivariate normal n n (/x |0, <Tp n ) 
distribution, with either a structure that has each of the 71 independent samples of MZ- 
DZ twin comparisons defining its own group, or a grouping structure has each of the 29 
studies defining its own group; 2-level random-effects models that allow for dependent level- 
2 random intercepts via the specification of a multivariate normal n n (/j, |0, crpn + ip~M. n ) 
distribution (Stevens & Taylor, 2009) with the binary (0-1) matrix M n indicating which 
pairs of the 71 study reports belong in the same study; 3-level random-effects models, with 
the 71 heritability (effect-size) estimate reports (level 2) nested within the 29 studies (level 
2), and with random intercepts modeled respectively by the multivariate normal densities 
n n (/x |0, ctqI„) and n n (/x 00 |0, o"o I n ); with each model having either no covariates, or having 
all 24 covariates with no spike- and- slab priors for automatic covariate selection, or having 
all 24 covariates along with spike- and- slab priors for automatic covariate selection via the 
posterior distribution. Finally, we also analyze the data set using the Bayesian meta-analytic 
model, which includes all 24 covariates, and which assigns spike-and-slab priors for automatic 
covariate selection via the posterior distribution. 

Enter Table 3 

On the basis of the behavioral genetics data, Table 3 compares parameter estimates, be- 
tween the 15 Bayesian normal fixed-effects models and normal random-effects models, and 
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the Bayesian nonparametric meta-analytic model. Among all the models compared, the 
Bayesian nonparametric model attained the smallest value of the predictive mean-square 
error criterion D(m), by a relatively-large margin. Hence, among all the models compared, 
the Bayesian nonparametric model provides the best description of the study-population 
effect-size distribution that underlies the (sample) behavioral genetics data set. Again, the 
D(m) criterion provides a useful diagnostic to detect the presence of skewed or more mul- 
timodal effect-size distributions for the underlying study population, via the comparison 
of the Bayesian nonparametric model, and the 15 normal meta-analytic models. Also, for 
the Bayesian nonparametric model, the predictive residual y/Di(m) 5-number summary was 
(.0, .0, .1, .1, .3) over the 71 heritability effect-size observations yi. So none of the observations 
appeared to be outliers under the model. Also, as shown in Table 3, the mean heritability 
(effect-size) estimate was quite similar among all the 16 Bayesian models, ranging from .49 
to .51. Moreover, for all the Bayesian models assigned spike-and-slab priors, the posterior 
inclusion probabilities Pr[7 = l|2?7i] did not exceed .05 for all 24 covariates, well below 
the significance threshold of .5. Also, among all random-effects models, the posterior mean 
estimates (<7q, al , ip) of the random intercept variances ranged between .00 to .02. 

Over all models, the estimate (/3 ) of the grand mean study effect ranged from .05 to .17. 
In addition, all models that were assigned spike-and-slab priors concluded that none of the 
24 covariates were significant, according to posterior inclusion probabilities Pr[7 = did 
not exceed .05, and thus were below the significance threshold of .5. The lack of significance 
for the SE(ES) suggests that there is a lack of evidence of publication bias in the data, in 
terms of the mean of the effect-size. For the Bayesian nonparametric model, the marginal 
posterior mean (standard deviation) estimate of the dispersion parameter was .11 (.03). 
Among all random-effects models, the posterior mean estimates (<7 , £7 00 , i/j) of the random 
intercept variances ranged from .05 to .11. 

As mentioned, the Bayesian nonparametric model is an infinite mixture model that is 
able to account for all possible shapes and locations of effect-size distributions, including 
all normal distributions. Meanwhile, in terms of the predictive mean-square error criterion 
D(m), the Bayesian nonparametric model far-outperformed all other normal fixed-effects 
and normal random-effects models, which assume more strictly assume normal effect-size 
densities. These facts together suggest that the data set violates the assumptions of effect- 
size normality. For the Bayesian nonparametric model, the right panel of Figure 3 presents 
the posterior predictive estimate of the overall heritability (effect-size) distribution, for the 
underlying population of studies. This estimate is conditional on the covariates x = x = 
(1,0,..., 0) T , and also, it is conditioned on the minimum effect-size variance a\ of .0001 
over all 71 heritability reports, so that this distribution reflects information from a large- 
sample study. According to this estimate of the overall heritability (effect-size) distribution, 
there is some evidence of skewness (—.1), with the overall mean (.50) and median (.51) 
heritability (effect-size) being slightly different. Moreover, there seems to be two modes in 
this distribution, one at about the mean of .50, and the other at about .35, suggesting there 
are about two "significant" heritabilities (effect sizes) in the population, not only one. Upon 
closer inspection, the modes appear to be at heritability values of .38 and .51. So both modes 
can provide information that contribute to the accumulation of evidence about the overall 
heritability (effect-size) for the substantive researchers of behavioral genetics. 
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The first panel of Figure 4 shows the median (50%ile) and interquartile range (i.e., 25%ile 
and 75%ile) of the posterior predictive estimate of the heritability (effect-size) distribution, 
by SE(ES) (and by corresponding effect-size variance {SE(ES)} 2 = a 2 ). As shown, the 
median effect-size has a rather nonlinear relationship with SE(ES), but the lack of strong 
relationship of the median effect-size with SE(ES) further confirms a lack of evidence of 
publication bias in the data. 

Finally, another important issue in this area of behavioral genetics deals with the issue 
of informant discrepancy; that is, the issue of whether the heritability (effect-size) estimates 
are the same across raters, or whether they depend on rater type, and further the heritability 
estimates may also depend on ratee age. Recall that each of the 71 heritability (effect-size) 
estimates were obtained from ratings of antisocial behavior that were made by one of 5 rater 
types, namely, the mother, the father, the teacher, the self, or an independent observer. 
To address this research issue in detail, the remaining five panels of Figure 4 present the 
posterior predictive estimates of the effect-size distributions, conditional on covariates x that 
indicate rater type and ratee age, while controlling for all other non-constant covariates by 
setting them to zero, and while conditioning on the effect-size variance estimate a 2 = .0001 
that reflects a large-sample study. As shown in these panels, the heritability-age correlation 
seems to be sightly negative for all rater types. Moreover, the heritability distributions 
seemed to be similar among mother, father, self, and independent observer raters, while the 
teacher raters have noticeably different heritability distributions. 

— Enter Figure 4 — 

6 Discussion 

In this paper, we have proposed a Bayesian nonparametric model for meta-analysis, 
and demonstrated its suitability for meta-analytic data sets that give rise to asymmetric 
and more multimodal effect-size population distributions. As mentioned, the traditional 
normal fixed- and random-effects models, while frequently used for meta-analysis, are not 
fully satisfactory because they make empirically-falsifiable assumptions about the data. They 
include the assumption that the effect-sizes are normally-distributed (conditionally on model 
parameters). As we have shown, empirical violations of such an assumption can negatively 
affect the accuracy of a meta-analysis. 

In contrast to the traditional models, our proposed Bayesian nonparametric model flexibly 
accounts for all distributions of the effect-sizes, including all normal distributions. For each 
of the real data set considered in this paper, this flexibility enabled the Bayesian model 
to provide a better description of the underlying effect-size distribution of the underlying 
study population (according to the predictive mean-square error D(m) criterion). At the 
same time, the model provides a richer description of meta-analytic data, by allowing the 
data analyst to infer the whole distribution of effect-sizes, over studies, and to infer how 
the whole distribution changes as a function of key study-level covariates. Thus, the model 
goes beyond the mean as the measure of an overall effect-size. Furthermore, for the given 
meta-analytic data set at hand, the Bayesian nonparametric model automatically identifies 
important study-level predictors of the mean effect-size. 
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Appendix A. MCMC Algorithm for Bayesian Normal Meta-Analytic Models 

Consider the general normal meta-analytic model (1) presented in Section 2.5, which has 
parameters C, = ((3, 7, fi , n 00 , cr 2 ,, a^ Q , ip), data likelihood density given by (la), and which 
has a prior probability density of the general form (4). For such a model, the full conditional 
posterior densities to be sampled at each stage of the MCMC algorithm are as follows: 



1. 7r(/3|X? n ,,£ n ,/x ,/x 00 ) = n p+1 (/3|m„V,), with V, = (V" 1 + XtS^X^)- 1 , m„ = 

V*(X.T£~ y n ) _1 , X n = (xj) nx ( p+ i), y„ = (y 1 , . . . , y n ) J , ^i 00n = (/x 00f (i), • • • , /^oo*(n)) T > 
and y n = (y n - (/x + ju 00n )); 

9 P < 1IT) V pl \ n(/3 fc |0, Wl )Pr( 7fc = l) 

2 - Pr(7fc = ip„,S n ,P,^ .Moo) " 



k = l,...,p; 



n(/3 fc |0, vq) [1 - Pr( 7fc = 1)] + n(/3 fc |0, Vl ) Pr( 7fc = 1) 



3. 7r(/i 0i |D n ,/3,/i 0M(i) ,S n ,a2) oc n(/z 0i |0, a 2 l n + i/jM n ) H" =1 n(^|xT/3 + fi 0i + fi 00t{{) ,a 



2 ) 
1 1 ■ 



i = l,...,n; 



4- 7r(fi 00t \V n ,f3,fi 0i ,^ n ,a 2 00 ) oc n(^ oot \0, a 2 m ) n i: t(i)=t n(2/i| x J/3+/%+/%)i, t = l,...,T; 

5. 7r(o- 2 \V n ,v ,ilj) ocn n (/x |0,crgl n , + ^M n ,)(co/(c + ag) 2 )un(^| - <t%/{K - 1), erg); 

6. tt(ct 00 2 |I?„,^ 00 ) oc nil n(/i ot|0,(7g )7r((7oo); 

7. 7T(ip\V n , fj, Q , al) oc n n (/x |0, + ^M n )un(^| - o~l/(K - 1), «t§); 

8- f(y picd \V n , (3, /i 0i , /i 00i(i) , £„) = n(y prcd |x T /3 + /i 0i + Ai 00tW , Sf), to generate samples from 
the posterior predictive density f n {y\x). 

The full conditionals in Steps 1-8, follow from standard theory for normal linear models (e.g., 
Denison, et al. 2002). Each of the full conditionals in stems 3-7 can be conveniently sampled 
by using the Metropolis-Hastings algorithm of MCMC (e.g., Chib & Greenberg). For a model 
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which assumes V 7 = vl p+ i with v — > oo and thus 7r(7 = 1) = 1, Step 2 can be ignored. For a 
random-effects model that assumes i/j = 0, along with gamma prior ti((Jq 2 ) = ga (<7q 2 |ao, &o) > 
Step 7 can be omitted, and the full conditional in Step 5 can be rewritten as tt((Tq 2 \T> n , fi ) = 
ga (ctq 2 |ao + n/2, bo + Y^i=i lAi/ty > an d hence can be sampled directly. Likewise, for a 3-level 
random-effects model with gamma prior vr(<Tg ) = ga (<7Q 2 |aoo, ^oo), the full conditional in 

Step 6 can be rewritten as Tr(aQ 2 \T> n , fj, 00 ) = ga ^<To 2 |aoo + T/2, b 00 + J2t=i ^oot/^j ) an d 
hence can be sampled directly. For a Bayesian 2-level normal random-effects model that 
assumes <Tq = (i.e., ^ 00 = 0), Step 6 can be ignored. For a fixed-effects model which 
assumes ex 2 , = ctq = ip = (i.e., fj, = and /x 00 = 0), Steps 3-7 can be ignored. We wrote 
MATLAB (2011, The MathWorks, Natick, MA) code that implements the MCMC sampling 
algorithm. 

Appendix B. MCMC Algorithm for Bayesian Nonparametric Model 

The Bayesian nonparametric model has infinitely- many parameters £ = ((3, 7, (fi j)°? = _ oo , 
<j), f3 u , a u ). Using the general methods described in Kalli et al. (2010), latent variables are 
added to implement exact MCMC algorithms to estimate its posterior distribution. Specifi- 
cally, for the Bayesian nonparametric model, latent variables (v,i,Zi G Z)" =1 are introduced, 
such that model's data likelihood is defined by the joint distribution 

n 

Y[{n( yi \fi 0zi +xT/3,05?)I(O < Ul < ZmK^WPv,**)}. (12) 

i=i 

For each i, marginalizing over each of the latent variables in (12) returns the original likeli- 
hood for yi given in equation (8). So by using the latent variables, the infinite-dimensional 
model (8) can be treated as a finite-dimensional model, which, in turn, permits the use of 
standard MCMC methods to sample the full joint posterior distribution of the model. Given 
all variables, save the (z;)f =1 , the choice of each z% has maximum finite values ±iV max , where 
iV max = maxjfmaxj I(wi < 

Now we describe the MCMC sampling algorithm in detail. The algorithm proceeds by 
generating a sample from each of the following 8 full conditional posterior distributions, in 
turn. For simplicity in presentation, we assume the default prior distributions of the Bayesian 
nonparametric model. 

1. tt^-I ■ • • ) oc h(a*oj-|0, al) \[ r ■ n(y i |/i 02i + xj(3, <pa 2 ), 3 = 0, ±1, . . . , ±A max , 

given A max = maxjfmaxj I(itj < and independent uniform draws it, ~j n( jun(0, £1^1), 

i = 1, . . . ,n; 

2. Pr(zi = j\ • • • ) oc n(yi\fi 0Zi + xj(3, <fx%)uj(x]P u , a u )I(0 < u { < £\ Zi \)^, independently 
for j = 0, ±l,...,±A max ; 

3. 7r(/3| • • •) = np+i(/3|m*,<^V*), 

given V; 1 = diag(w -» 00; lO-^i.OOl-f 1 ^), . . . , lO^.OOl^-^+XTdiag^ 2 , . . . ,a ? ; 2 )X, 
m* = V*X T diag(a^~ 2 , . . . , a~ 2 )y n , with X the n x p matrix of row vectors (1, xj) (i = 
l,...,n),y = (yi,...,y n ) T ,/x z = (fj, 0zi , . . . , /i 02 J T , y n = (y-^ z ),andz = (z u . . . , z n ) J ; 



29 



, < i \ n^lO.lO^.OOl 1 -^) , j j_i -C 7 i 

4 - 7r(Tfc ■ ■ • = / fl , n im , — (o i n nmV independently for k = 1, . . . ,p; 
n(/9 fe |0,10) + n(/9 fe |0, .001) 

5. vrO^lZg = ga (0-^^/2 + n/2,&;), with 6; = a^/2 + (yTdiag(a~ 2 , . . . , d~ 2 y n - 
mTV*m,)/2; 

6. vr(/3J ■■■) = n p+1 (/3 w |m w „ cr 2 V w «), given V" 1 = 10~%+i + X T X, = V^(X T z,), 
z* = (2*, . . . ,z*) T , and independent draws z* ~ ind n(z*|xj/3 w , <7 2 )I(zj — 1 < z* < Zj) 
(i = 1, . . .,n); 

7. vr(aj 2 | • • • ) = ga(aj 2 |l + ra/2, 1 + (zjz* - m^V^m^)^); 

8- /(y prcd | x ? ' " " ) — n (2/i|/- t o.z+ xT A 0^ 2 )> given z =ceiling(2;*) and draw z* ~ n(z* |x T /3 w , er 2 ) 

The first full conditional posterior distribution listed above can be sampled using a random- 
walk Metropolis-Hastings algorithm, with independent normal proposal densities n(/z 0j |0, cr 2 ), 
j = 0, ±1, . . . , ±iV max . All the other 7 remaining full conditional distributions can each be 
sampled directly using standard Gibbs sampling methods, which are based on known re- 
sults for normal Bayesian linear models, and Dirichlet process mixed random-effects models 
(e.g., Denison, et al. 2002; Ibrahim & Kleinman, 1998; Escobar & West, 1998). We wrote 
MATLAB (2011, The MathWorks, Natick, MA) code that implements the MCMC sampling 
algorithm. 



Appendix C. Behavioral-Genetic Studies 

The following numerical list provides citations for all 29 behavioral-genetic studies, which 
were subject to a meta-analysis. The list is given in the order of publication year. Each item 
of the list presents the identification number assigned to the given study. The identification 
numbers of the studies are also shown in Figure 3. 

1. Scarr, S. 1966. Genetic factors in activity motivation. Child Development 37: 663-673. 

2. Owen, D.R., & Sines, J.O. 1970. Heritability of personality in children. Behavior 
Genetics 1: 235-248. 
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University of Texas Press. 

4. Rowe, D.C 1983. Biometrical genetic models of self-reported delinquent behavior: A 
twin study. Behavior Genetics 13: 473-489. 

5. Graham, P., & Stevenson, J. 1985. A twin study of genetic influences on behavioral 
deviance. Journal of the American Academy of Child Psychiatry 24: 33-41. 

6. Lytton, H., Watts, D., & Dunn, B.E. 1988. Stability of genetic determination from 
age 2 to age 9: A longitudinal twin study. Social Biology 35: 62-73. 
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analysis in Dutch twins. Behavior Genetics 33: 591-605. 
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in genetic and environmental effects on aggression. Aggressive Behavior 29, 55-68. 
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Child Psychology 32: 111-122. 
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The genetic basis of problem behavior in 5- year-old Dutch twin pairs. Behavior Ge- 
netics 34: 229-242. 
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and externalizing psychopathology. Psychological Medicine 35, 637-648. 
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483-491. 
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23. Tuvblad, C, Grann, M., & Lichtenstein, P. 2006. Heritability for adolescent antisocial 
behavior differs with socioeconomic status: gene-environment interaction. Journal of 
Child Psychology and Psychiatry 47: 734-743. 
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Aggressive Behavior 32: 308-320. 
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Tables 



Effect-size Description Effect-size (yi) Variance (cr 2 ) 

Unbiased standardized mean ^ - £ 2 . # f n u +n 27 + y\ . 

difference, two independent / (mi-ijgn+K-ifa ^ ™ l _ 2{nu+ ™ 2i) / 

groups (Hedges, 1981). V n.ii+n 2i -2 C — 1 4( nii+n2i _2)_l 

Fisher z transformation i . 1 + p» 1 

i log 



of the correlation p i . 2 1 — rii + 3 

Log odds ratio for two ~ 7 nm/ni 0i \ 1 1 1 

log -, + 



binary (0-1) variables. \n 01i /n m J n lu n m n 0li n 00i 

Table 1: Examples of effect size statistics, along with corresponding variances. 
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Unimodal Simulation 


Model 


D(m) 




95%CI 


^0 


BNP 


18 


1 


(-.7,1.2) 


.05 (.3) 


2L (Bayes) 


15 


1 


(.9,1.1) 


.04 (.1) 


FE (Bayes) 


16 


1 


(.9,1.1) 




2L (MLE) 




1 


(.9,1.2) 


.00 


2L (REML) 




1 


(.9,1.2) 


.00 


FE (MLE) 




1 


(.8,1.1) 




Multimodal Simulation 


Model 


D(m) 




95%CI 


^0 


BNP 


10 


.1 


(-1.8,3) 


4.5 (14) 


2L (Bayes) 


12 





(-.4, .4) 


1.1 (.3) 


FE (Bayes) 


16 





(-.2,.l) 




2L (MLE) 







(-.4, .3) 


1.1 


2L (REML) 







(-.4, .3) 


1.2 


FE (MLE) 







(-.4, .3) 





Table 2: For the two simulated data sets, a comparison of estimates between the new Bayesian 
nonparametric (BNP) model, various fixed-effects (FE) models, and various 2-level random 
effects (2L) models. For the level-2 random effects variance, the number in parentheses 
gives the posterior standard deviation. Bayes: Bayesian model; MLE: full maximum likeli- 
hood; REML: Restricted maximum-likelihood; D(m): Model predictive mean-square error 
criterion; CI: a 95 percent confidence interval (or Bayesian credible interval). 
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Mean ES 



Model 


D(m) 


A) 


;sd) 


a 2 (SD) 


a 2 00 (SD) 


5 (SD) 


BNP-ss 


.6 


.50 


(.03) 


.03 (.01) 


— 


— 


D2L-0 


4.8 


.49 


(.03) 


.02 (.01) 


— 


.01 (.01) 


2L-0, by MZ-DZ 


4.8 


.50 


(.02) 


.02 (.01) 


— 


— 


3L-0 


4.8 


.50 


(.03) 


.02 (.01) 


.02 (.00) 


— 


D2L-ss 


5.4 


.50 


(.03) 


.01 (.01) 


— 


.00 (.00) 


2L-ss, by MZ-DZ 


5.4 


.50 


(.03) 


.01 (.00) 


— 


— 


3L-ss 


5.4 


.50 


(.03) 


.01 (.00) 


.01 (.00) 


— 


2L-x, by MZ-DZ 


5.5 


.51 


(.03) 


.01 (.00) 


— 


— 


D2L-x 


5.5 


.51 


(.03) 


.01 (.01) 


— 


.00 (.00) 


3L-x 


5.5 


.50 


(.03) 


.01 (.01) 


.01 (.00) 


— 


2L-0, by Study 


5.8 


.51 


(.01) 


.00 (.00) 






FE-0 


5.9 


.51 


(.01) 








2L-ss, by Study 


6.0 


.51 


(.02) 


.00 (.00) 






FE-ss 


6.0 


.51 


(.02) 








2L-x, by Study 


6.0 


.51 


(.02) 


.00 (.00) 






FE-x 


6.0 


.51 


(.02) 









Table 3: For the school calendar data, a comparison of marginal posterior mean and standard 
deviation (SD) estimates of parameters, between the Bayesian nonparametric model, various 
normal fixed-effects (FE) models, and various normal 2- level (2L), dependent 2-level (D2L), 
and 3-level random effects (RE) models. Each two-level random effect model either clusters 
effect sizes into the 71 MZ-DZ twin comparisons, or clusters effect sizes into the 29 studies, 
x: all covariates included in model, no spike-and-slab priors; ss: all covariates included, with 
spike- and- slab priors; 0: no covariates in the model (only intercept term); D(m): Model 
predictive criterion; Mean ES: mean effect size. 



35 



Figure Captions 



Figure 1. Effect size density /(y|x) for o w = 1/20, 1/2, 1, 2, given x T /3 = .7 and given 
sampled values of (fij,a^). 

Figure 2. For the Bayesian nonparametric model, the posterior predictive estimate f n (y) 
of the effect size density (distribution), for the unimodal effect-size simulation (left panel), 
and for the bimodal effect-size simulation (right panel). 

Figure 3. Left panel: Heritability estimate, and its variance (+), for each of the 71 
independent samples of MZ and DZ twin comparisons, provided by 29 studies. A circle refers 
to females, square refers to males. Also, red refers to mother rater, black to teacher rater, blue 
to self-rater, magenta to observer rater, and green to mixed raters. The study identification 
number is located in the given square or circle, and this numbering is according to the order 
of publication date. Right panel: Bayesian estimate of the heritability distribution, over the 
universe of studies. Med: median; Var: variance; Skew: skewness; Kurt: kurtosis. 

Figure 4. For each of the 6 panels, the estimated posterior predictive median (solid line) 
and interquartile range (dashed lines) of antisocial behavior heritability (effect size), given 
values of a covariate, while controlling for all other covariates by fixing their values to zero. 
Mom: mother rater; Dad: father rater; Teacher: teacher rater; Self: self rater; Observer: 
observer rater. 
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